hiCLUMP: A Hybrid Implementation of the CLUMP Algorithm for Clustering Microarrays Data
Abstract
Microarray technology allows us to measure the expression levels of hundreds of thousands of genes simultaneously. Microarray data analysis involves several computationally heavy tasks, such as clustering. Clustering can be defined as partitioning a dataset into groups such that objects in the same group are similar in some way. CLUMP (clustering through MST in parallel) is a minimum spanning tree (MST)-based clustering technique that employs a parallel approach to reduce the MST construction time. An enhanced version, iCLUMP, was proposed to further improve the MST construction phase using the cover tree data structure. Despite that modification, MST construction remains a time-consuming bottleneck. Both CLUMP and iCLUMP are based on a distributed parallel computing model. The objective of this paper is therefore to study a different approach to enhancement, using a hybrid parallel model. The proposed algorithm, hiCLUMP (hybrid CLUMP), applies multithreading to some of the distributed partitions suggested by the CLUMP algorithm. Experimental results on six different microarray datasets show that the load balancing strategy used in hiCLUMP reduced the MST construction time by between 8% and 17% on 36 processing nodes. However, hiCLUMP could not outperform iCLUMP, which suggests that using a different data structure is more effective than increasing the processing power of the underlying parallel machine.
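To make the idea of MST-based clustering concrete, the following is a minimal sequential sketch in Python. It is not the CLUMP, iCLUMP, or hiCLUMP implementation: it assumes Euclidean distance, builds the MST with a dense O(n^2) version of Prim's algorithm, and forms clusters by cutting the heaviest MST edges, which is one common way to cluster with an MST. The function names `prim_mst` and `mst_clusters` and the parameter `k` are hypothetical and chosen only for illustration.

```python
import math


def prim_mst(points):
    """Return MST edges as (weight, u, v) using dense Prim's algorithm, O(n^2)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0           # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax distances from the newly added vertex (assumed Euclidean metric).
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Cut the k-1 heaviest MST edges and return one cluster label per point."""
    edges = sorted(prim_mst(points))
    kept = edges[: max(len(edges) - (k - 1), 0)]

    # Union-find over the kept edges recovers the connected components.
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_clusters(data, k=2))
```

The pairwise distance work inside `prim_mst` dominates the running time, which is consistent with the abstract's point that MST construction is the bottleneck and the natural target for parallelization.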
Similar resources
A parallel Clustering algorithm based on minimum spanning tree for microarrays data analysis
Clustering is the partitioning of a set of observations into groups called clusters, where the observations in the same group share a common characteristic. One of the best-known algorithms for solving the microarray data clustering problem using a minimum spanning tree (MST) is the CLUMP algorithm (Clustering algorithm through MST in Parallel), which identifies dense clusters in a noisy background. The MST c...
Hybrid ANFIS with ant colony optimization algorithm for prediction of shear wave velocity from a carbonate reservoir in Iran
Shear wave velocity (Vs) data are key information for petrophysical, geophysical and geomechanical studies. Although compressional wave velocity (Vp) measurements exist in almost all wells, shear wave velocity is not recorded for most older wells due to a lack of technological tools. Furthermore, measuring shear wave velocity is to some extent costly. This study proposes a novel methodolo...
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are so widely used in many fields, much research is still under way to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and hence can become trapped in a local optimum. In this paper...
Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach
The clustering problem under the minimum sum-of-squares criterion is a non-convex, non-linear program with many locally optimal values, so its solution often falls into these traps and cannot converge to the globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...
GROUND MOTION CLUSTERING BY A HYBRID K-MEANS AND COLLIDING BODIES OPTIMIZATION
The stochastic nature of earthquakes poses a challenge for engineers in choosing which records to use in their analyses. Clustering is offered as a solution to this data mining problem, automatically distinguishing between ground motion records based on similarities in the corresponding seismic attributes. The present work formulates an optimization problem to seek the best clustering measures. In...